{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "execution": {}
   },
   "source": [
    "# Dimensionality Reduction: Principal component analysis\n",
    "\n",
    "__Content creators:__ Alex Cayco Gajic, John Murray\n",
    "\n",
    "__Content modified:__ Kai Chen"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "execution": {}
   },
   "source": [
    "---\n",
    "# Tutorial Objectives\n",
    "\n",
    "In this notebook we'll learn how to perform PCA by projecting the data onto the eigenvectors of its covariance matrix. \n",
    "\n",
    "Overview:\n",
    "- Calculate the eigenvectors of the sample covariance matrix.\n",
    "- Perform PCA by projecting data onto the eigenvectors of the covariance matrix. \n",
    "- Plot and explore the eigenvalues.\n",
    "\n",
    "To quickly refresh your knowledge of eigenvalues and eigenvectors, you can watch this [short video](https://www.youtube.com/watch?v=kwA3qM0rm7c) (4 minutes) for a geometrical explanation. For a deeper understanding, this [in-depth video](https://www.youtube.com/watch?v=PFDu9oVAE-g&list=PLZHQObOWTQDPD3MizzM2xVFitgF8hE_ab&index=14) (17 minutes) provides an excellent basis and is beautifully illustrated."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "execution": {}
   },
   "source": [
    "---\n",
    "# Setup\n",
    "Run these cells to get the tutorial started."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab_type": "code",
    "execution": {}
   },
   "outputs": [],
   "source": [
    "# Imports\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "colab_type": "code",
    "execution": {}
   },
   "outputs": [],
   "source": [
    "# @title Figure Settings\n",
    "import ipywidgets as widgets  # interactive display\n",
    "%config InlineBackend.figure_format = 'retina'\n",
    "\n",
    "nma_style = {\n",
    "    'figure.figsize' : (8, 6),\n",
    "    'figure.autolayout' : True,\n",
    "    'font.size' : 15,\n",
    "    'xtick.labelsize' : 'small',\n",
    "    'ytick.labelsize' : 'small',\n",
    "    'legend.fontsize' : 'small',\n",
    "    'axes.spines.top' : False,\n",
    "    'axes.spines.right' : False,\n",
    "    'xtick.major.size' : 5,\n",
    "    'ytick.major.size' : 5,\n",
    "}\n",
    "for key, value in nma_style.items():\n",
    "    plt.rcParams[key] = value\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "colab_type": "code",
    "execution": {}
   },
   "outputs": [],
   "source": [
    "# @title Helper functions\n",
    "\n",
    "\n",
    "def plot_eigenvalues(evals):\n",
    "  \"\"\"\n",
    "  Plots eigenvalues.\n",
    "\n",
    "  Args:\n",
    "      (numpy array of floats) : Vector of eigenvalues\n",
    "\n",
    "  Returns:\n",
    "    Nothing.\n",
    "\n",
    "  \"\"\"\n",
    "  plt.figure(figsize=(4, 4))\n",
    "  plt.plot(np.arange(1, len(evals) + 1), evals, 'o-k')\n",
    "  plt.xlabel('Component')\n",
    "  plt.ylabel('Eigenvalue')\n",
    "  plt.title('Scree plot')\n",
    "  plt.xticks(np.arange(1, len(evals) + 1))\n",
    "  plt.ylim([0, 2.5])\n",
    "\n",
    "\n",
    "def sort_evals_descending(evals, evectors):\n",
    "  \"\"\"\n",
    "  Sorts eigenvalues and eigenvectors in decreasing order. Also aligns first two\n",
    "  eigenvectors to be in first two quadrants (if 2D).\n",
    "\n",
    "  Args:\n",
    "    evals (numpy array of floats)    : Vector of eigenvalues\n",
    "    evectors (numpy array of floats) : Corresponding matrix of eigenvectors\n",
    "                                        each column corresponds to a different\n",
    "                                        eigenvalue\n",
    "\n",
    "  Returns:\n",
    "    (numpy array of floats)          : Vector of eigenvalues after sorting\n",
    "    (numpy array of floats)          : Matrix of eigenvectors after sorting\n",
    "  \"\"\"\n",
    "\n",
    "  index = np.flip(np.argsort(evals))\n",
    "  evals = evals[index]\n",
    "  evectors = evectors[:, index]\n",
    "  if evals.shape[0] == 2:\n",
    "    if np.arccos(np.matmul(evectors[:, 0],\n",
    "                           1 / np.sqrt(2) * np.array([1, 1]))) > np.pi / 2:\n",
    "      evectors[:, 0] = -evectors[:, 0]\n",
    "    if np.arccos(np.matmul(evectors[:, 1],\n",
    "                           1 / np.sqrt(2) * np.array([-1, 1]))) > np.pi / 2:\n",
    "      evectors[:, 1] = -evectors[:, 1]\n",
    "  return evals, evectors\n",
    "\n",
    "\n",
    "def plot_data(X):\n",
    "  \"\"\"\n",
    "  Plots bivariate data. Includes a plot of each random variable, and a scatter\n",
    "  scatter plot of their joint activity. The title indicates the sample\n",
    "  correlation calculated from the data.\n",
    "\n",
    "  Args:\n",
    "    X (numpy array of floats) : Data matrix each column corresponds to a\n",
    "                                different random variable\n",
    "\n",
    "  Returns:\n",
    "    Nothing.\n",
    "  \"\"\"\n",
    "\n",
    "  fig = plt.figure(figsize=[8, 4])\n",
    "  gs = fig.add_gridspec(2, 2)\n",
    "  ax1 = fig.add_subplot(gs[0, 0])\n",
    "  ax1.plot(X[:, 0], color='k')\n",
    "  plt.ylabel('Neuron 1')\n",
    "  ax2 = fig.add_subplot(gs[1, 0])\n",
    "  ax2.plot(X[:, 1], color='k')\n",
    "  plt.xlabel('Sample Number (sorted)')\n",
    "  plt.ylabel('Neuron 2')\n",
    "  ax3 = fig.add_subplot(gs[:, 1])\n",
    "  ax3.plot(X[:, 0], X[:, 1], '.', markerfacecolor=[.5, .5, .5],\n",
    "           markeredgewidth=0)\n",
    "  ax3.axis('equal')\n",
    "  plt.xlabel('Neuron 1 activity')\n",
    "  plt.ylabel('Neuron 2 activity')\n",
    "  plt.title('Sample corr: {:.1f}'.format(np.corrcoef(X[:, 0], X[:, 1])[0, 1]))\n",
    "  plt.show()\n",
    "\n",
    "\n",
    "def get_data(cov_matrix):\n",
    "  \"\"\"\n",
    "  Returns a matrix of 1000 samples from a bivariate, zero-mean Gaussian\n",
    "\n",
    "  Note that samples are sorted in ascending order for the first random\n",
    "  variable.\n",
    "\n",
    "  Args:\n",
    "    var_1 (scalar)                     : variance of the first random variable\n",
    "    var_2 (scalar)                     : variance of the second random variable\n",
    "    cov_matrix (numpy array of floats) : desired covariance matrix\n",
    "\n",
    "  Returns:\n",
    "    (numpy array of floats)            : samples from the bivariate Gaussian,\n",
    "                                          with each column corresponding to a\n",
    "                                          different random variable\n",
    "  \"\"\"\n",
    "\n",
    "  mean = np.array([0, 0])\n",
    "  X = np.random.multivariate_normal(mean, cov_matrix, size=1000)\n",
    "  indices_for_sorting = np.argsort(X[:, 0])\n",
    "  X = X[indices_for_sorting, :]\n",
    "  return X\n",
    "\n",
    "\n",
    "def calculate_cov_matrix(var_1, var_2, corr_coef):\n",
    "  \"\"\"\n",
    "  Calculates the covariance matrix based on the variances and\n",
    "  correlation coefficient.\n",
    "\n",
    "  Args:\n",
    "    var_1 (scalar)         :  variance of the first random variable\n",
    "    var_2 (scalar)         :  variance of the second random variable\n",
    "    corr_coef (scalar)     :  correlation coefficient\n",
    "\n",
    "  Returns:\n",
    "    (numpy array of floats) : covariance matrix\n",
    "  \"\"\"\n",
    "  cov = corr_coef * np.sqrt(var_1 * var_2)\n",
    "  cov_matrix = np.array([[var_1, cov], [cov, var_2]])\n",
    "  return cov_matrix\n",
    "\n",
    "\n",
    "def define_orthonormal_basis(u):\n",
    "  \"\"\"\n",
    "  Calculates an orthonormal basis given an arbitrary vector u.\n",
    "\n",
    "  Args:\n",
    "    u (numpy array of floats) : arbitrary 2D vector used for new basis\n",
    "\n",
    "  Returns:\n",
    "    (numpy array of floats)   : new orthonormal basis columns correspond to\n",
    "                                basis vectors\n",
    "  \"\"\"\n",
    "\n",
    "  u = u / np.sqrt(u[0] ** 2 + u[1] ** 2)\n",
    "  w = np.array([-u[1], u[0]])\n",
    "  W = np.column_stack((u, w))\n",
    "  return W\n",
    "\n",
    "\n",
    "def plot_data_new_basis(Y):\n",
    "  \"\"\"\n",
    "  Plots bivariate data after transformation to new bases. Similar to plot_data\n",
    "  but with colors corresponding to projections onto basis 1 (red) and\n",
    "  basis 2 (blue).\n",
    "  The title indicates the sample correlation calculated from the data.\n",
    "\n",
    "  Note that samples are re-sorted in ascending order for the first random\n",
    "  variable.\n",
    "\n",
    "  Args:\n",
    "    Y (numpy array of floats) : Data matrix in new basis each column\n",
    "                                corresponds to a different random variable\n",
    "\n",
    "  Returns:\n",
    "    Nothing.\n",
    "  \"\"\"\n",
    "\n",
    "  fig = plt.figure(figsize=[8, 4])\n",
    "  gs = fig.add_gridspec(2, 2)\n",
    "  ax1 = fig.add_subplot(gs[0, 0])\n",
    "  ax1.plot(Y[:, 0], 'r')\n",
    "  plt.ylabel('Projection \\n basis vector 1')\n",
    "  ax2 = fig.add_subplot(gs[1, 0])\n",
    "  ax2.plot(Y[:, 1], 'b')\n",
    "  plt.xlabel('Projection \\n basis vector 1')\n",
    "  plt.ylabel('Projection \\n basis vector 2')\n",
    "  ax3 = fig.add_subplot(gs[:, 1])\n",
    "  ax3.plot(Y[:, 0], Y[:, 1], '.', color=[.5, .5, .5])\n",
    "  ax3.axis('equal')\n",
    "  plt.xlabel('Projection basis vector 1')\n",
    "  plt.ylabel('Projection basis vector 2')\n",
    "  plt.title('Sample corr: {:.1f}'.format(np.corrcoef(Y[:, 0], Y[:, 1])[0, 1]))\n",
    "  plt.show()\n",
    "\n",
    "\n",
    "def change_of_basis(X, W):\n",
    "  \"\"\"\n",
    "  Projects data onto a new basis.\n",
    "\n",
    "  Args:\n",
    "    X (numpy array of floats) : Data matrix each column corresponding to a\n",
    "                                different random variable\n",
    "    W (numpy array of floats) : new orthonormal basis columns correspond to\n",
    "                                basis vectors\n",
    "\n",
    "  Returns:\n",
    "    (numpy array of floats)   : Data matrix expressed in new basis\n",
    "  \"\"\"\n",
    "\n",
    "  Y = np.matmul(X, W)\n",
    "  return Y\n",
    "\n",
    "\n",
    "def plot_basis_vectors(X, W):\n",
    "  \"\"\"\n",
    "  Plots bivariate data as well as new basis vectors.\n",
    "\n",
    "  Args:\n",
    "    X (numpy array of floats) : Data matrix each column corresponds to a\n",
    "                                different random variable\n",
    "    W (numpy array of floats) : Square matrix representing new orthonormal\n",
    "                                basis each column represents a basis vector\n",
    "\n",
    "  Returns:\n",
    "    Nothing.\n",
    "  \"\"\"\n",
    "\n",
    "  plt.figure(figsize=[4, 4])\n",
    "  plt.plot(X[:, 0], X[:, 1], '.', color=[.5, .5, .5], label='Data')\n",
    "  plt.axis('equal')\n",
    "  plt.xlabel('Neuron 1 activity')\n",
    "  plt.ylabel('Neuron 2 activity')\n",
    "  plt.plot([0, W[0, 0]], [0, W[1, 0]], color='r', linewidth=3,\n",
    "           label='Basis vector 1')\n",
    "  plt.plot([0, W[0, 1]], [0, W[1, 1]], color='b', linewidth=3,\n",
    "           label='Basis vector 2')\n",
    "  plt.legend()\n",
    "  plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "execution": {}
   },
   "source": [
    "---\n",
    "# Section 1: Calculate the eigenvectors of the the sample covariance matrix\n",
    "\n",
    "As we saw in the lecture, PCA represents data in a new orthonormal basis defined by the eigenvectors of the covariance matrix. Remember that in the previous tutorial, we generated bivariate normal data with a specified covariance matrix $\\bf \\Sigma$, whose $(i,j)$th element is:\n",
    "\\begin{equation}\n",
    "\\Sigma_{ij} = E[ x_i x_j ] - E[ x_i] E[ x_j ] .\n",
    "\\end{equation}\n",
    "However, in real life we don't have access to this ground-truth covariance matrix. To get around this, we can use the sample covariance matrix, $\\bf\\hat\\Sigma$, which is calculated directly from the data. The $(i,j)$th element of the sample covariance matrix is:\n",
    "\\begin{equation}\n",
    " \\hat \\Sigma_{ij} =  \\frac{1}{N_\\text{samples}}{\\bf x}_i^T {\\bf x}_j - \\bar {\\bf x}_i \\bar{\\bf x}_j ,\n",
    "\\end{equation}\n",
    "where ${\\bf x}_i = [ x_i(1), x_i(2), \\dots,x_i(N_\\text{samples})]^T$ is a column vector representing all measurements of neuron $i$, and  $\\bar {\\bf x}_i$ is the mean of neuron $i$ across samples:\n",
    "\\begin{equation}\n",
    "\\bar {\\bf x}_i = \\frac{1}{N_\\text{samples}} \\sum_{k=1}^{N_\\text{samples}} x_i(k).\n",
    "\\end{equation}\n",
    "If we assume that the data has already been mean-subtracted, then we can write the sample covariance matrix in a much simpler matrix form:\n",
    "\\begin{align}\n",
    "{\\bf \\hat \\Sigma}\n",
    "&= \\frac{1}{N_\\text{samples}} {\\bf X}^T {\\bf X}.\n",
    "\\end{align}\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "execution": {}
   },
   "source": [
    "## Exercise 1: Calculation of the covariance matrix\n",
    "\n",
    "Before calculating the eigenvectors, you must first calculate the sample covariance matrix.\n",
    "\n",
    "**Steps**\n",
    "* Complete the function `get_sample_cov_matrix` by first subtracting the sample mean of the data, then calculate $\\bf \\hat \\Sigma$ using the equation above.\n",
    "* Use `get_data` to generate bivariate normal data, and calculate the sample covariance matrix with your finished `get_sample_cov_matrix`. Compare this estimate to the true covariate matrix using `calculate_cov_matrix`. \n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab_type": "code",
    "execution": {}
   },
   "outputs": [],
   "source": [
    "help(get_data)\n",
    "help(calculate_cov_matrix)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab_type": "code",
    "execution": {}
   },
   "outputs": [],
   "source": [
    "def get_sample_cov_matrix(X):\n",
    "  \"\"\"\n",
    "  Returns the sample covariance matrix of data X\n",
    "\n",
    "  Args:\n",
    "    X (numpy array of floats) : Data matrix each column corresponds to a\n",
    "                                different random variable\n",
    "\n",
    "  Returns:\n",
    "    (numpy array of floats)   : Covariance matrix\n",
    "  \"\"\"\n",
    "\n",
    "  #################################################\n",
    "  ## TODO for students: calculate the covariance matrix\n",
    "  # Fill out function and remove\n",
    "  raise NotImplementedError(\"Student excercise: calculate the covariance matrix!\")\n",
    "  #################################################\n",
    "\n",
    "  # Subtract the mean of X\n",
    "  X = ...\n",
    "  # Calculate the covariance matrix (hint: use np.matmul)\n",
    "  cov_matrix = ...\n",
    "\n",
    "  return cov_matrix\n",
    "\n",
    "\n",
    "##########################################################\n",
    "## TODO for students: generate bivariate Gaussian data\n",
    "# with variances of 1 and a correlation coefficient of 0.8\n",
    "# compare the true and sample covariance matrices\n",
    "##########################################################\n",
    "variance_1 = 1\n",
    "variance_2 = 1\n",
    "corr_coef = 0.8\n",
    "\n",
    "np.random.seed(2020)  # set random seed\n",
    "# Uncomment below code to test your function\n",
    "# cov_matrix = calculate_cov_matrix(variance_1, variance_2, corr_coef)\n",
    "# print(cov_matrix)\n",
    "\n",
    "# X = get_data(cov_matrix)\n",
    "# sample_cov_matrix = get_sample_cov_matrix(X)\n",
    "# print(sample_cov_matrix)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "execution": {}
   },
   "source": [
    "SAMPLE OUTPUT\n",
    "```\n",
    "[[1.  0.8]\n",
    " [0.8 1. ]]\n",
    "[[0.99315313 0.82347589]\n",
    " [0.82347589 1.01281397]]\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "execution": {}
   },
   "source": [
    "[*Click for solution*](https://github.com/NeoNeuron/professional-workshop-3/tree/master//tutorials/W3_DimensionalityReduction/solutions/W3_Tutorial2_Solution_a98b14b7.py)\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "execution": {}
   },
   "source": [
    "## Exercise 2: Eigenvectors of the Covariance matrix\n",
    "\n",
    "Next you will calculate the eigenvectors of the covariance matrix. Plot them along with the data to check that they align with the geometry of the data.\n",
    "\n",
    "**Steps:**\n",
    "* Calculate the eigenvalues and eigenvectors of the sample covariance matrix. (**Hint:** use `np.linalg.eigh`, which finds the eigenvalues of a symmetric matrix).\n",
    "* Use the provided code to sort the eigenvalues in descending order.\n",
    "* Plot the eigenvectors on a scatter plot of the data, using the function `plot_basis_vectors`. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab_type": "code",
    "execution": {}
   },
   "outputs": [],
   "source": [
    "help(sort_evals_descending)\n",
    "help(plot_basis_vectors)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab_type": "code",
    "execution": {}
   },
   "outputs": [],
   "source": [
    "#################################################\n",
    "## TO DO for students: Calculate and sort the eigenvalues in descending order\n",
    "#################################################\n",
    "# Calculate the eigenvalues and eigenvectors\n",
    "# evals, evectors = ...\n",
    "# Sort the eigenvalues in descending order\n",
    "# evals, evectors = ...\n",
    "\n",
    "# plot_basis_vectors(X, evectors)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "execution": {}
   },
   "source": [
    "[*Click for solution*](https://github.com/NeoNeuron/professional-workshop-3/tree/master//tutorials/W3_DimensionalityReduction/solutions/W3_Tutorial2_Solution_9834df3f.py)\n",
    "\n",
    "*Example output:*\n",
    "\n",
    "<img alt='Solution hint' align='left' width=539.0 height=540.0 src=https://raw.githubusercontent.com/NeoNeuron/professional-workshop-3/master/tutorials/W3_DimensionalityReduction/static/W3_Tutorial2_Solution_9834df3f_0.png>\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "execution": {}
   },
   "source": [
    "---\n",
    "# Section 2: Perform PCA by projecting data onto the eigenvectors\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "execution": {}
   },
   "source": [
    "To perform PCA, we will project the data onto the eigenvectors of the covariance matrix, i.e.:\n",
    "\\begin{equation}\n",
    "\\bf S = X W\n",
    "\\end{equation}\n",
    "where $\\bf S$ is an $N_\\text{samples} \\times N$ matrix representing the projected data (also called *scores*), and $W$ is an $N\\times N$ orthogonal matrix, each of whose columns represents the eigenvectors of the covariance matrix (also called *weights* or *loadings*). "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "execution": {}
   },
   "source": [
    "## Exercise 3: PCA implementation\n",
    "\n",
    "You will now perform PCA on the data using the intuition you have developed so far. Fill in the function below to carry out the steps to perform PCA by projecting the data onto the eigenvectors of its covariance matrix.\n",
    "\n",
    "**Steps:**\n",
    "* First subtract the mean.\n",
    "* Then calculate the sample covariance matrix.\n",
    "* Then find the eigenvalues and eigenvectors and sort them in descending order.\n",
    "* Finally project the mean-centered data onto the eigenvectors."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab_type": "code",
    "execution": {}
   },
   "outputs": [],
   "source": [
    "help(change_of_basis)\n",
    "help(plot_data_new_basis)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab_type": "code",
    "execution": {}
   },
   "outputs": [],
   "source": [
    "def pca(X):\n",
    "  \"\"\"\n",
    "  Sorts eigenvalues and eigenvectors in decreasing order.\n",
    "\n",
    "  Args:\n",
    "    X (numpy array of floats): Data matrix each column corresponds to a\n",
    "                               different random variable\n",
    "\n",
    "  Returns:\n",
    "    (numpy array of floats)  : Data projected onto the new basis\n",
    "    (numpy array of floats)  : Vector of eigenvalues\n",
    "    (numpy array of floats)  : Corresponding matrix of eigenvectors\n",
    "\n",
    "  \"\"\"\n",
    "\n",
    "  #################################################\n",
    "  ## TODO for students: calculate the covariance matrix\n",
    "  # Fill out function and remove\n",
    "  raise NotImplementedError(\"Student excercise: sort eigenvalues/eigenvectors!\")\n",
    "  #################################################\n",
    "\n",
    "  # Subtract the mean of X\n",
    "  X = ...\n",
    "  # Calculate the sample covariance matrix\n",
    "  cov_matrix = ...\n",
    "  # Calculate the eigenvalues and eigenvectors\n",
    "  evals, evectors = ...\n",
    "  # Sort the eigenvalues in descending order\n",
    "  evals, evectors = ...\n",
    "  # Project the data onto the new eigenvector basis\n",
    "  score = ...\n",
    "\n",
    "  return score, evectors, evals\n",
    "\n",
    "\n",
    "#################################################\n",
    "## TODO for students: Call the function to calculate the eigenvectors/eigenvalues\n",
    "#################################################\n",
    "\n",
    "# Perform PCA on the data matrix X\n",
    "# score, evectors, evals = ...\n",
    "# Plot the data projected into the new basis\n",
    "# plot_data_new_basis(score)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "execution": {}
   },
   "source": [
    "[*Click for solution*](https://github.com/NeoNeuron/professional-workshop-3/tree/master//tutorials/W3_DimensionalityReduction/solutions/W3_Tutorial2_Solution_429cc93e.py)\n",
    "\n",
    "*Example output:*\n",
    "\n",
    "<img alt='Solution hint' align='left' width=1116.0 height=540.0 src=https://raw.githubusercontent.com/NeoNeuron/professional-workshop-3/master/tutorials/W3_DimensionalityReduction/static/W3_Tutorial2_Solution_429cc93e_0.png>\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "execution": {}
   },
   "source": [
    "## Plot and explore the eigenvalues\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "execution": {}
   },
   "source": [
    "   \n",
    "Finally, we will examine the eigenvalues of the covariance matrix. Remember that each eigenvalue describes the variance of the data projected onto its corresponding eigenvector. This is an important concept because it allows us to rank the PCA basis vectors based on how much variance each one can capture. First run the code below to plot the eigenvalues (sometimes called the \"scree plot\"). Which eigenvalue is larger?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab_type": "code",
    "execution": {}
   },
   "outputs": [],
   "source": [
    "plot_eigenvalues(evals)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "execution": {}
   },
   "source": [
    "## Interactive Demo: Exploration of the correlation coefficient\n",
    "\n",
    "Run the following cell and use the slider to change the correlation coefficient in the data. You should see the scree plot and the plot of basis vectors update.\n",
    "\n",
    "**Questions:**\n",
    "* What happens to the eigenvalues as you change the correlation coefficient?\n",
    "* Can you find a value for which both eigenvalues are equal?\n",
    "* Can you find a value for which only one eigenvalue is nonzero?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "colab_type": "code",
    "execution": {}
   },
   "outputs": [],
   "source": [
    "# @title\n",
    "\n",
    "# @markdown Make sure you execute this cell to enable the widget!\n",
    "\n",
    "\n",
    "def refresh(corr_coef=.8):\n",
    "  cov_matrix = calculate_cov_matrix(variance_1, variance_2, corr_coef)\n",
    "  X = get_data(cov_matrix)\n",
    "  score, evectors, evals = pca(X)\n",
    "  plot_eigenvalues(evals)\n",
    "  plot_basis_vectors(X, evectors)\n",
    "\n",
    "\n",
    "_ = widgets.interact(refresh, corr_coef=(-1, 1, .1))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "execution": {}
   },
   "source": [
    "[*Click for solution*](https://github.com/NeoNeuron/professional-workshop-3/tree/master//tutorials/W3_DimensionalityReduction/solutions/W3_Tutorial2_Solution_1227c45a.py)\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "execution": {}
   },
   "source": [
    "---\n",
    "# Summary\n",
    "- In this tutorial, we learned that goal of PCA is to find an orthonormal basis capturing the directions of maximum variance of the data. More precisely, the $i$th basis vector is the direction that maximizes the projected variance, while being orthogonal to all previous basis vectors. Mathematically, these basis vectors are the eigenvectors of the covariance matrix (also called *loadings*). \n",
    "- PCA also has the useful property that the projected data (*scores*) are uncorrelated.\n",
    "- The projected variance along each basis vector is given by its corresponding eigenvalue. This is important because it allows us rank the \"importance\" of each basis vector in terms of how much of the data variability it explains. An eigenvalue of zero means there is no variation along that direction so it can be dropped without losing any information about the original data.\n",
    "- In the next tutorial, we will use this property to reduce the dimensionality of high-dimensional data.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "---\n",
    "# Notation\n",
    "\n",
    "\\begin{align}\n",
    "\\mathbf{x_i}  &\\quad \\text{all measurements of neuron } i\\\\\n",
    "\\bar{\\bf x_i}  &\\quad \\text{mean across samples for neuron } i \\\\\n",
    "\\bf \\Sigma &\\quad \\text{covariance matrix}\\\\\n",
    "\\bf \\hat \\Sigma &\\quad \\text{sample covariance matrix}\\\\\n",
    "\\bf W &\\quad \\text{weights, loadings matrix}\\\\\n",
    "{\\bf X} &\\quad \\text{original data matrix}\\\\\n",
    "\\bf S &\\quad \\text{projected matrix, scores}\\\\\n",
    "N &\\quad \\text{data dimensionality}\\\\\n",
    "N_\\text{samples} &\\quad \\text{number of samples}\\\\\n",
    "\\end{align}"
   ]
  }
 ],
 "metadata": {
  "colab": {
   "collapsed_sections": [],
   "include_colab_link": true,
   "name": "W3_Tutorial2",
   "provenance": [],
   "toc_visible": true
  },
  "interpreter": {
   "hash": "63d742608f992112235285d8014d3ad9bd22d914fb5c109d9df6e126fac219ce"
  },
  "kernel": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "kernelspec": {
   "display_name": "Python 3.7.8 64-bit ('base': conda)",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.11"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
